This lecture is about using a time series
as context to potentially
discover causal topics in text.
In this lecture, we're going to continue
discussing Contextual Text Mining.
In particular, we're going to look
at the time series as a context for
analyzing text,
to potentially discover causal topics.
As usual, let's start with the motivation.
In this case, we hope to use text
mining to understand a time series.
Here, what you are seeing is the Dow Jones
Industrial Average stock price curve.
And you'll see a sudden drop here.
Right.
So one would be interested in knowing
what might have caused the stock
market to crash.
Well, if you know the background,
you might be able to figure it out if you
look at the time stamp, or there may be other
data that can help us think about it.
But the question here is can
we get some clues about this
from the companion news stream?
And we have a lot of news data
that was generated during that period.
So if we do that, we might
actually discover that the crash
happened after
the time of the September 11 attack.
And that's the time when there
was a sudden rise of the topic
about September 11
in news articles.
Here's another scenario where we want
to analyze the Presidential Election.
And this is a time series from
the Presidential Prediction Market.
For example, the Iowa Electronic Market
would have stocks for each candidate.
And if you believe a candidate will
win, then you tend to buy the stock for
that candidate, causing the price
of that candidate's stock to increase.
So, that's a nice way to actually
survey people's opinions about
these candidates.
Now, suppose you see a sudden
drop in price for one candidate.
You might want to know what
might have caused the sudden drop.
Or in a social science study, you might
be interested in knowing what matters
in this election,
what issues really matter to people.
Now again in this case,
we can look at the companion news
stream and ask the question.
Are there any clues in the news stream
that might provide insight about this?
So for example,
we might discover that mentions of tax cuts
have been increasing since that point.
So maybe,
that's related to the drop of the price.
So all these cases are special
cases of a general problem of joint
analysis of text and time series
data to discover causal topics.
The input in this case is a time series plus
text data produced in the same
time period, the companion text stream.
And this is different from
the standard topic models,
where we have just a text collection.
That's why we see the time series here:
it serves as context.
Now, the output that we
want to generate is the topics
whose coverage in the text stream has
strong correlations with the time series.
For example, whenever the topic is
mentioned, the price tends to go down, etc.
Now we call these topics Causal Topics.
Of course, they're not,
strictly speaking, causal topics.
We are never going to be able to
verify whether they are causal, or
there's a true causal relationship here.
That's why we put causal
in quotation marks.
But at least they are correlated
topics that might potentially
explain the cause, and
humans can certainly further analyze such
topics to understand the issue better.
And the output would contain topics
just like in topic modeling.
But we hope that these topics are not
just the regular topics.
These topics certainly don't have to
explain the text data the best, but
they still have to explain
the data in the text,
meaning that they have to represent
meaningful, semantically coherent topics in text.
But also, more importantly,
they should be correlated with the external
time series that's given as context.
So to understand how we solve this
problem, let's first consider
solving the problem with a regular
topic model, for example PLSA.
We can apply this to the text stream,
with some extension like CPLSA, or
Contextual PLSA.
Then we can discover the
topics in the collection and
also discover their coverage over time.
So, one simple solution is
to choose the topics from
this set that have the strongest
correlation with the external time series.
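The simple baseline just described can be sketched in a few lines of Python. This is a minimal sketch, assuming each topic's coverage has already been computed as one value per time step (for example from PLSA or LDA output); the function name `rank_topics_by_correlation` and the toy numbers are hypothetical, made up for illustration.

```python
import numpy as np

def rank_topics_by_correlation(coverage, series):
    """Rank topics by the absolute Pearson correlation between each topic's
    coverage over time and the external time series (strongest first).

    coverage: shape (num_topics, T); coverage[k, t] is topic k's coverage at step t
    series:   shape (T,); the external, non-text time series
    """
    coverage = np.asarray(coverage, dtype=float)
    series = np.asarray(series, dtype=float)
    scores = [(k, float(np.corrcoef(row, series)[0, 1]))
              for k, row in enumerate(coverage)]
    return sorted(scores, key=lambda kr: abs(kr[1]), reverse=True)

# Toy (invented) data: 4 topics over 6 time steps, with a sudden drop in the series.
series = np.array([10.0, 9.5, 9.0, 6.0, 5.5, 5.0])
coverage = np.array([
    [0.00, 0.05, 0.10, 0.40, 0.45, 0.50],  # rises exactly as the series falls
    [0.20, 0.30, 0.20, 0.30, 0.20, 0.30],  # oscillates, only weakly related
    [0.30, 0.35, 0.30, 0.25, 0.30, 0.20],
    [0.20, 0.20, 0.30, 0.10, 0.10, 0.20],
])
ranked = rank_topics_by_correlation(coverage, series)
print(ranked)
```

Note that this baseline can only choose among the topics the model already produced, which is exactly the limitation discussed next.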
But this approach is not
going to be very good.
Why?
Because
we are restricted to the topics
that were discovered by PLSA or LDA.
And that means the choice of
topics will be very limited.
And we know these models try to maximize
the likelihood of the text data.
So those topics tend to be the major
topics that explain the text data well.
And they are not necessarily
correlated with the time series.
Even if we get the best one, the most
correlated topics might still not be so
interesting from a causal perspective.
So in the work cited here,
a better approach was proposed.
And this approach is called
Iterative Causal Topic Modeling.
The idea is to do an iterative
adjustment of the topics
discovered by topic models, using the
time series to induce a prior.
So here's an illustration of
how this works.
Take the text stream as input and
then apply regular topic modeling
to generate a number of topics,
let's say the four topics
shown here.
And then we're going to use
external time series to assess
which topic is more causally related or
correlated with the external time series.
So we have something that ranks them.
And we might think that topic one and
topic four are more correlated and
topic two and topic three are not.
Now we could have stopped here, and
that would be just like the simple
approach that I talked about earlier:
we would take these topics and
call them causal topics.
But as I also explained, these
topics are unlikely to be very good,
because they are general topics that
explain the whole text collection.
They are not necessarily
the best topics correlated
with our time series.
So what we can do in this approach
is to first zoom into the word level
and look into each word in
the top-ranked word list for each topic.
Let's say we take Topic 1
as the target to examine.
We know Topic 1 is correlated
with the time series,
or is at least the best that we could
get from this set of topics so far.
And we're going to look at the words
in this topic, the top words.
And if the topic is correlated
with the Time Series,
there must be some words that are highly
correlated with the Time Series.
So here, for example,
we might discover W1 and W3 are positively
correlated with Time Series, but
W2 and W4 are negatively correlated.
So, as a topic, it's not good to mix
these words with different correlations.
So we can further
separate these words.
We are going to get all the red words
that indicate positive correlations,
W1 and W3.
And
we're also going to get another subtopic,
if you want,
that represents the negatively
correlated words, W2 and W4.
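The word-level split can be sketched as follows. This is a minimal sketch, assuming we already have a per-time-step count series for each top word of the topic; the function name, the cutoff `min_abs_r`, and the choice to weight each word by |r| are illustrative assumptions, not necessarily what the cited work does (it may, for example, use a significance test instead of a fixed cutoff).

```python
import numpy as np

def split_topic_by_word_correlation(word_series, series, min_abs_r=0.5):
    """Split a topic's top words into a positively and a negatively correlated
    subtopic, each returned as a normalized word distribution.

    word_series: dict mapping word -> counts of that word at each time step
    series:      external time series with the same length
    min_abs_r:   hypothetical cutoff; words with |r| below it are dropped
    """
    pos, neg = {}, {}
    for word, counts in word_series.items():
        r = float(np.corrcoef(counts, series)[0, 1])
        if r >= min_abs_r:
            pos[word] = r
        elif r <= -min_abs_r:
            neg[word] = -r  # store |r| as the weight

    def normalize(weights):
        total = sum(weights.values())
        return {w: v / total for w, v in weights.items()} if total > 0 else {}

    return normalize(pos), normalize(neg)

# Toy (invented) counts for the words W1..W4 of Topic 1, plus one unrelated word.
series = [1, 2, 3, 4, 5]
word_series = {
    "w1": [2, 4, 6, 8, 10],
    "w2": [10, 8, 6, 4, 2],
    "w3": [1, 3, 2, 5, 6],
    "w4": [5, 5, 3, 2, 1],
    "w5": [3, 1, 4, 1, 3],  # uncorrelated with the series, so it is dropped
}
positive_subtopic, negative_subtopic = split_topic_by_word_correlation(word_series, series)
```

Each returned dictionary is a small word distribution, which is the form needed to feed a subtopic back into a topic model as a prior.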
Now, these subtopics, or these variations
of topics, based on the correlation
analysis, are topics that are still quite
related to the original topic, Topic 1.
But they are already deviating,
because of the use of time series
information to bias the selection of words.
So in some sense,
we should expect them to be
more correlated with the time
series than the original Topic 1,
because Topic 1 has mixed words,
and here we separate them.
So each of these two subtopics
can be expected to be better
correlated with the time series.
However, they may not be so
semantically coherent.
So the idea here is to go back
to the topic model, using
each of these as a prior to further
guide the topic modeling.
That is to say, we ask our topic
model to now discover topics that
are very similar to each
of these two subtopics.
And this will cause a bias toward topics
more correlated with the time series.
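How a subtopic is used "as a prior" depends on the model. One standard option for PLSA, assumed here rather than taken from the lecture, is maximum a posteriori estimation with a conjugate (Dirichlet) prior: in the M-step, a topic's expected word counts are augmented with pseudo-counts `mu * p(w | subtopic)`, so the topic is pulled toward the subtopic's words while the text data can still reshape it. A minimal sketch of that single update (the function name is hypothetical):

```python
import numpy as np

def m_step_with_prior(expected_counts, prior, mu):
    """MAP re-estimate of one topic's word distribution p(w | theta).

    expected_counts: expected count of each vocabulary word assigned to this
                     topic, as computed in the E-step (shape (V,))
    prior:           word distribution from a subtopic (shape (V,), sums to 1)
    mu:              prior strength in pseudo-counts; mu = 0 gives plain ML
    """
    counts = np.asarray(expected_counts, dtype=float) + mu * np.asarray(prior, dtype=float)
    return counts / counts.sum()

# Toy vocabulary of 4 words; the prior favors words 0 and 2.
expected = np.array([2.0, 2.0, 0.0, 0.0])
prior = np.array([0.5, 0.0, 0.5, 0.0])
ml_estimate = m_step_with_prior(expected, prior, mu=0.0)   # ignores the prior
map_estimate = m_step_with_prior(expected, prior, mu=4.0)  # pulled toward the prior
```

A larger `mu` keeps the next generation of topics closer to the correlated subtopics; a smaller one lets coherence with the text dominate.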
Of course, then we can apply topic models
to get another generation of topics.
And those can be further ranked based on
the time series to select the highly
correlated topics.
And then we can further analyze
the component words in the topic and
then try to analyze word-level
correlation,
and then get even more
correlated subtopics that can be
further fed into the process as priors
to drive the topic model discovery.
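The overall loop can be sketched as a higher-order function. This is a structural sketch only: `fit_topics`, `causal_score`, and `split_topic` are hypothetical callables standing in for a prior-aware topic model (such as PLSA or LDA), a topic-level causality measure (for example a correlation or Granger test on the topic's coverage), and the word-level split described above. The toy stand-ins at the bottom, including the word correlations, are invented just to make the sketch runnable; a real `fit_topics` would re-fit the model on the text instead of returning the priors unchanged.

```python
from typing import Callable, Dict, List, Optional

WordDist = Dict[str, float]  # a topic or subtopic: word -> probability

def iterative_causal_topics(
    fit_topics: Callable[[Optional[List[WordDist]]], List[WordDist]],
    causal_score: Callable[[WordDist], float],
    split_topic: Callable[[WordDist], List[WordDist]],
    top_k: int = 2,
    iterations: int = 3,
) -> List[WordDist]:
    """Alternate between topic modeling (coherence) and correlation-driven
    word selection (causality); return the top-ranked topics of the last round."""
    priors: Optional[List[WordDist]] = None
    selected: List[WordDist] = []
    for _ in range(iterations):
        topics = fit_topics(priors)  # coherence step: topic model guided by priors
        selected = sorted(topics, key=causal_score, reverse=True)[:top_k]
        # causality step: split each selected topic by word-level correlation
        priors = [sub for topic in selected for sub in split_topic(topic)]
    return selected

# Toy stand-ins (invented values, for illustration only)
WORD_CORR = {"air": 0.9, "vote": -0.8, "tax": 0.1, "oil": 0.2}

def toy_fit(priors: Optional[List[WordDist]]) -> List[WordDist]:
    if priors is None:  # first round: general topics with mixed words
        return [{"air": 0.5, "vote": 0.5}, {"tax": 0.6, "oil": 0.4}]
    return priors       # a real model would re-fit on the text using the priors

def toy_score(topic: WordDist) -> float:
    return abs(sum(p * WORD_CORR[w] for w, p in topic.items()))

def toy_split(topic: WordDist) -> List[WordDist]:
    groups = []
    for keep in (lambda r: r > 0, lambda r: r < 0):
        part = {w: p for w, p in topic.items() if keep(WORD_CORR[w])}
        if part:
            total = sum(part.values())
            groups.append({w: p / total for w, p in part.items()})
    return groups

result = iterative_causal_topics(toy_fit, toy_score, toy_split, top_k=2)
```

In the toy run, the mixed topic {air, vote} scores poorly at first because its positive and negative words cancel, but once it is split, its two halves become the top-ranked topics.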
So this whole process is just a heuristic
way of optimizing causality and
coherence, and that's our ultimate goal.
Right?
So here you see that pure topic
models will be very good at
maximizing topic coherence;
the topics will all be meaningful.
If we only use a causality test,
or correlation measure,
then we might get a set of words that
are strongly correlated with the time series,
but they may not
necessarily mean anything.
They might not be semantically connected.
So, that would be at the other extreme,
on the top.
Now, the ideal is to get causal
topics that score high,
both in topic coherence and
also in causal relation.
This approach
can be regarded as an alternating
way to maximize both dimensions.
So when we apply the topic models,
we're maximizing the coherence.
But when we decompose the topic
model words into sets
of words that are very strongly
correlated with the time series,
selecting the most strongly correlated
words with the time series,
we are pushing the model
back to the causal
dimension to make it
better in causal scoring.
And then, when we apply
the selected words as a prior
to guide topic modeling, we again
go back to optimize the coherence,
because topic models will ensure that the next
generation of topics is coherent. And
we can iterate, optimizing both
in this way, as shown in this picture.
So I think the only component of such a
framework that you haven't seen is how
to measure causality,
because the rest is just topic modeling.
So let's have a little bit
of discussion of that.
So here we show that.
Let's say we have a topic
about government response here.
And then, with topic modeling, we can
get the coverage of the topic over time.
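Turning a topic into a time series can be done in several ways. One simple choice, assumed here, is to average the topic's proportion over all documents published in each time step (summing instead of averaging would also reflect news volume). A minimal sketch, with a hypothetical function name, assuming document-topic proportions from a fitted model and an integer time-step index per document:

```python
import numpy as np

def topic_coverage_series(doc_topic, doc_step, num_steps, topic):
    """Coverage of one topic per time step: the mean proportion of `topic`
    over the documents in step t.

    doc_topic: shape (num_docs, num_topics), per-document topic proportions
    doc_step:  shape (num_docs,), integer time step (0..num_steps-1) of each document
    Steps without any document get coverage 0.
    """
    doc_topic = np.asarray(doc_topic, dtype=float)
    doc_step = np.asarray(doc_step, dtype=int)
    totals = np.bincount(doc_step, weights=doc_topic[:, topic], minlength=num_steps)
    counts = np.bincount(doc_step, minlength=num_steps)
    return np.divide(totals, counts, out=np.zeros(num_steps), where=counts > 0)

# Three toy documents: two on day 0, none on day 1, one on day 2.
doc_topic = [[0.8, 0.2], [0.4, 0.6], [0.1, 0.9]]
x_t = topic_coverage_series(doc_topic, doc_step=[0, 0, 2], num_steps=3, topic=0)
```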
So, we have a time series, X sub t.
Now, we also have a given time series
that represents external information.
It's a non-text time series, Y sub t,
such as the stock prices.
Now the question
here is, does Xt cause Yt?
In other words, we want to measure
the causal relation between the two,
or maybe just measure
the correlation of the two.
There are many measures that
we can use in this framework.
For example, Pearson correlation
is a commonly used measure.
And we have to consider time lag here so
that we can try to
capture causal relations,
using the data in the past
of X to try to correlate with the data
points of Y that represent the future,
for example.
And by introducing such a lag, we can
hopefully capture some causal relation
even by using correlation measures
like Pearson correlation.
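A lagged Pearson correlation is easy to compute directly. The sketch below (function names hypothetical) correlates X at time t with Y at time t + lag, so a positive lag pairs past values of the text-derived series with later values of the external series, and then picks the lag with the largest absolute correlation. The toy Y is simply X delayed by one step.

```python
import numpy as np

def lagged_pearson(x, y, lag):
    """Pearson r between x[t] and y[t + lag]: past X against future Y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag < 0 or lag >= len(x) - 1:
        raise ValueError("lag must leave at least two overlapping points")
    return float(np.corrcoef(x[:len(x) - lag], y[lag:])[0, 1])

def best_lag(x, y, max_lag):
    """Lag in 0..max_lag with the largest absolute lagged correlation."""
    return max(range(max_lag + 1), key=lambda k: abs(lagged_pearson(x, y, k)))

# Toy data: y is x shifted one step later (y[t] = x[t - 1]).
x = [1, 5, 2, 8, 3, 9, 4]
y = [0, 1, 5, 2, 8, 3, 9]
print(lagged_pearson(x, y, 0), lagged_pearson(x, y, 1), best_lag(x, y, 3))
```

Scanning many lags and keeping the best one raises the risk of spurious matches; in this toy data, lag 3 also correlates strongly even though the true delay is 1.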
But a commonly used measure for
causality here is the Granger Causality Test.
And the idea of this test
is actually quite simple.
Basically, you're going to have
an autoregressive model that
uses the history information
of Y to predict Y itself.
And this is the best we could do
without any other information.
So we're going to build such a model.
And then we're going to add some history
information of X into such a model
to see if we can improve
the prediction of Y.
If we can do that with a statistically
significant difference,
then we say X has some
causal influence on Y;
otherwise it wouldn't have caused an
improvement in the prediction of Y.
If, on the other hand,
the difference is insignificant,
that would mean X does not really
have a causal relation with Y.
So that's the basic idea.
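The test described above can be written with ordinary least squares. This is a minimal sketch of the standard F-test form of Granger causality using only NumPy; the function names and the toy data are assumptions for illustration. It fits a restricted model (intercept plus p lags of Y) and a full model (also p lags of X), then compares their residual sums of squares. To turn the F statistic into a p-value, one would compare it against an F distribution with p and (n - p) - (2p + 1) degrees of freedom (for example with `scipy.stats.f.sf`); statsmodels also provides a ready-made `grangercausalitytests` function in `statsmodels.tsa.stattools`.

```python
import numpy as np

def _lag_matrix(v, p):
    """Rows t = p..n-1; column k-1 holds v[t-k] for k = 1..p."""
    n = len(v)
    return np.column_stack([v[p - k:n - k] for k in range(1, p + 1)])

def granger_f_stat(x, y, p):
    """F statistic for H0: the past p values of x do not improve the
    prediction of y beyond what y's own past p values already give."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    target = y[p:]
    restricted = np.hstack([np.ones((n - p, 1)), _lag_matrix(y, p)])  # AR model of y
    full = np.hstack([restricted, _lag_matrix(x, p)])                  # plus x's history

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_restricted, rss_full = rss(restricted), rss(full)
    df_num = p
    df_den = (n - p) - full.shape[1]
    return ((rss_restricted - rss_full) / df_num) / (rss_full / df_den)

# Toy data: y follows x with a one-step delay plus a little noise.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=200)
y_demo = np.zeros(200)
y_demo[1:] = 0.8 * x_demo[:-1] + 0.1 * rng.normal(size=199)

print(granger_f_stat(x_demo, y_demo, p=2))  # expected to be large: x's past helps predict y
print(granger_f_stat(y_demo, x_demo, p=2))  # expected to be small: y's past does not help predict x
```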
Now, we don't have time to explain
this in detail, but you
can read the cited reference
here to learn more about this measure.
It's a very commonly used measure
with many applications.
So next, let's look at some sample
results generated by this approach.
Here the data is
the New York Times
in the time period from June
2000 through December of 2011.
And here the time series we used
are the stock prices of two companies,
American Airlines and Apple, and
the goal is to see whether, if we inject
the time series as context,
we can actually get topics
that are biased toward the time series.
Imagine if we don't use any input,
we don't use any context.
Then the topics from the New York
Times discovered by PLSA would be
just general topics that
people talk about in the news,
the major topics of the news events.
But here you see these topics are indeed
biased toward each time series.
In particular, if you look
at the underlined words here
in the American Airlines result,
you see airlines,
airport, air, united, trade,
terrorism, etc.
So it clearly has topics that are more
correlated with the external time series.
On the right side,
you see that some of the topics
are clearly related to Apple, right.
So you can see computer, technology,
software, internet, com, web, etc.
So that just means the time series
has effectively served as a context
to bias the discovery of topics.
From another perspective,
these results help us understand what people
have talked about in each case:
not just
what people have talked about,
but also what topics might be
correlated with the stock prices.
And so these topics can serve
as a starting point for
people to further look into the issues and
find the true causal relations.
Here are some other results from analyzing
Presidential Election time series.
The time series data here is
from the Iowa Electronic Market,
which is a prediction market.
And the text data is from the same source,
the New York Times, from May
2000 to October 2000,
covering the 2000
presidential election campaign.
Now, what you see here
are the top three words in significant
topics from the New York Times.
And if you look at these topics, they
are indeed quite related to the campaign.
Actually, the issues
are very much related to
the important issues of
this presidential election.
Now here I should mention that the text
data has been filtered by using
only the articles that mention
these candidate names.
It's a subset of these news articles.
Very different from
the previous experiment.
But the results here clearly show
that the approach can uncover some
important issues in that
presidential election.
So tax cut, oil energy, abortion and
gun control are all known
to be important issues in
that presidential election.
And that was supported by some
literature in political science,
and also by the discussion
on Wikipedia.
So basically the results show
that the approach can effectively
discover possibly causal topics
based on the time series data.
So there are two suggested readings here.
One is the paper about this iterative
topic modeling with time series feedback.
Where you can find more details
about how this approach works.
And the second one is reading
about the Granger Causality test.
So in the end, let's summarize
the discussion of Text-based Prediction.
Now, Text-based prediction
is generally very useful for
big data applications that involve text.
Because it can help us infer
new knowledge about the world.
And the knowledge can go beyond
what's discussed in the text.
As a result, it can also support
optimizing our decision making.
And this has widespread applications.
Text data is often combined with
non-text data for prediction,
because, for this purpose,
the prediction purpose,
we generally would like to combine
non-text data and text data together,
to gather as many clues as possible for prediction.
And so, as a result, joint
analysis of text and
non-text data is very necessary, and
it's also very useful.
Now when we analyze text data
together with non-text data,
we can see they can help each other.
So non-text data provides a context for
mining text data, and
we discussed a number of techniques for
contextual text mining.
And on the other hand,
text data can also help interpret
patterns discovered from non-text data,
and this is called pattern annotation.
In general,
this is a very active research topic, and
there are new papers being published.
And there are also many open
challenges that have to be solved.

